A U-statistic estimator for the variance of resampling-based error estimators
Abstract
We revisit resampling procedures for error estimation in binary classification from the viewpoint of U-statistics. The key observation is that the error rate estimator that averages over all learning-testing splits is a U-statistic, so several standard theorems on U-statistics apply. In particular, this estimator has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, an unbiased estimator of this minimal variance exists if the total sample size is at least twice the learning set size plus two. In that case, we exhibit such an estimator, which is itself a U-statistic. It again enjoys various optimality properties and yields an asymptotically exact hypothesis test of the equality of error rates when two learning algorithms are compared. Our statements apply to any deterministic learning algorithm under weak non-degeneracy assumptions. In an application to tuning parameter choice in lasso regression on a gene expression data set, the test does not reject the null hypothesis of equal error rates for two different parameter values.
Keywords: Unbiased Estimator; Penalized Regression Model; U-Statistic; Cross-Validation; Machine Learning
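The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a learning set of size g, a single test observation per split, and a variance estimate formed as the squared error estimate minus the average product of kernel values over pairs of disjoint blocks, which requires n >= 2g + 2. The names `majority_learner`, `kernel`, `error_u_statistic` and `variance_estimate`, and the toy data, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def majority_learner(train):
    """Hypothetical deterministic learner: predict the majority label (ties -> 0)."""
    ones = sum(y for _, y in train)
    label = 1 if 2 * ones > len(train) else 0
    return lambda x: label


def kernel(block, learner):
    """Symmetric kernel on g+1 observations.

    Each observation serves once as the test point while the remaining g
    observations form the learning set; the 0/1 losses are averaged.
    """
    total = 0.0
    for i, (x, y) in enumerate(block):
        train = block[:i] + block[i + 1:]
        total += float(learner(train)(x) != y)
    return total / len(block)


def error_u_statistic(data, g, learner):
    """Average the kernel over all subsets of size g+1 (all learning-testing splits)."""
    blocks = list(combinations(data, g + 1))
    return sum(kernel(b, learner) for b in blocks) / len(blocks)


def variance_estimate(data, g, learner):
    """Sketch of a variance estimate for the error U-statistic (assumed form).

    theta_hat**2 minus the average product of kernel values over all pairs of
    disjoint blocks. Disjoint blocks only exist when n >= 2g + 2.
    """
    n, m = len(data), g + 1
    if n < 2 * m:
        raise ValueError("need n >= 2g + 2 observations")
    values = {
        s: kernel(tuple(data[i] for i in s), learner)
        for s in combinations(range(n), m)
    }
    theta_hat = sum(values.values()) / len(values)
    prod_sum, count = 0.0, 0
    for s, t in combinations(values, 2):
        if set(s).isdisjoint(t):
            prod_sum += values[s] * values[t]
            count += 1
    theta_sq_hat = prod_sum / count  # unbiased for the squared error rate
    return theta_hat ** 2 - theta_sq_hat


# Toy data (hypothetical): (feature, label) pairs.
data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
err = error_u_statistic(data, g=1, learner=majority_learner)
var = variance_estimate(data, g=1, learner=majority_learner)
```

Enumerating all subsets is only feasible for very small n and g; in practice one would presumably average over a random sample of blocks. Because the estimate is a difference of two terms, it can come out negative on small samples.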
Similar resources
A Comparison of Output-analysis Methods for Simulations of Processes with Multiple Regeneration Sequences
We compare several simulation estimators for a performance measure of a process having multiple regeneration sequences. We examine the setting of two regeneration sequences. We compare two existing estimators, the permuted estimator and the semi-regenerative estimator, and two new estimators, a type of U-statistic estimator and a type of V-statistic estimator. The last two estimators are obta...
A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs
The mean prediction error of a classification or regression procedure can be estimated using resampling designs such as the cross-validation design. We decompose the variance of such an estimator associated with an arbitrary resampling procedure into a small linear combination of covariances between elementary estimators, each of which is a regular parameter as described in the theory of U-stat...
The Ratio-type Estimators of Variance with Minimum Average Square Error
Ratio-type estimators were introduced for estimating the population mean and total, but in recent years several estimators of the population variance based on ratio methods have been proposed. In this paper, two families of estimators are suggested and their approximate mean square errors (MSE) are derived. In addition, the efficiency of these variance estimators are com...
Estimating the error variance in nonparametric regression by a covariate-matched U-statistic
For nonparametric regression models with fixed and random design, two classes of estimators for the error variance have been introduced: second sample moments based on residuals from a nonparametric fit, and difference-based estimators. The former are asymptotically optimal but require estimating the regression function; the latter are simple but have larger asymptotic variance. For nonparametr...
Nonparametric Inference Relative Errors of Difference-Based Variance Estimators in Nonparametric Regression
Difference-based estimators for the error variance are popular since they do not require the estimation of the mean function. Unlike most existing difference-based estimators, new estimators proposed by Müller et al. (2003) and Tong and Wang (2005) achieved the asymptotic optimal rate as residual-based estimators. In this article, we study the relative errors of these difference-based estimator...